\section{Experiment}

In this section, we conduct extensive experiments to evaluate the effectiveness of our proposed approach.

% 6.1
\subsection{Dataset \& Preprocessing}

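Because calling the OpenAI API is costly, we sample each original dataset by question type and construct a new, smaller dataset from it.
We argue that parsing complex questions into SPARQL is database-specific. We therefore organize the data from the perspective of the underlying knowledge base and construct three datasets, one for each of Freebase, Wikidata, and MetaQA.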
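
As a sketch of one way to formalize this per-type sampling (the notation below is ours, and the per-type budget $k$ is an assumed parameter, not a value reported in this section), let $D$ be an original dataset whose questions are partitioned by type into $D_1, \dots, D_T$. A sampled dataset $S$ can then be written as
\[
S = \bigcup_{t=1}^{T} S_t, \qquad S_t \subseteq D_t, \qquad |S_t| = \min(k, |D_t|),
\]
where each $S_t$ is assumed to be drawn uniformly at random without replacement from $D_t$. Under this assumption every question type contributes at most $k$ questions, so frequent types do not crowd out rare ones.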

\paragraph{Freebase Based Datasets} 
WebQuestionsSP \cite{yih_value_2016} and ComplexWebQuestions 1.1 \cite{talmor_web_2018} are widely used question answering datasets that pair natural language questions with SPARQL queries over Freebase. WebQuestionsSP was collected by querying the Google Suggest API with the question as the prefix and then manually annotating the SPARQL query. The original work does not categorize the questions, so we split them into two classes by the length of the inference chain (the path from the topic entity to the answer node): length 1 and length 2.
% TODO: add a citation; an earlier paper classifies questions this way.
ComplexWebQuestions 1.1 is an extension of WebQuestionsSP that contains 4 types of complex questions: composition, conjunction, superlative, and comparative. We follow the original classification, but sample uniformly from each question type.

% Should the difficulties and characteristics of each dataset be discussed here or in the experiments?
\paragraph{Wikidata Based Datasets}
KQA Pro \cite{} is a large-scale dataset for complex question answering over a dense subset of Wikidata. It contains 9 types of complex questions: Count, QueryAttr, QueryAttrQualifier, QueryName, QueryRelation, QueryRelationQualifier, SelectAmong, SelectBetween, and Verify. We again sample a subset following the original classification.

\paragraph{MetaQA}
MetaQA \cite{} (Zhang et al., 2017) consists of a movie ontology derived from the WikiMovies dataset and three sets of question-answer pairs at different levels of difficulty: 1-hop, 2-hop, and 3-hop. It evaluates effectiveness in a specific domain. We again sample a subset following the original classification.
% TODO (unfinished note): in addition, we ... the work of meta

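% Should we list the distribution of each question type in the original datasets (counted separately for train/dev), to highlight that this work focuses on different question types?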

\subsection{Baselines}

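Because we sample and construct new datasets, we re-implement previous methods and evaluate them on these datasets to ensure a fair comparison.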

% 6.2
\subsection{Evaluation Metrics}



\subsection{Implementation Details}





\subsection{Results}

Main experimental results


Number of dialogue turns and cost


Impact of entity linking





\subsection{Case Study}

OpenAI

LLM

\subsection{Error Analysis}

OpenAI

LLM
